Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging pathogen, first recognized in 2012,
with a high case fatality risk, no vaccine, and no treatment beyond supportive care. We estimated the relative risks of 
death and severe disease among MERS-CoV patients in the Middle East between 2012 and 2015 for several risk 
factors, using Poisson regression with robust variance and a bootstrap-based expectation maximization algorithm 
to handle extensive missing data. Increased age and underlying comorbidity were risk factors for both death and 
severe disease, while cases arising in Saudi Arabia were more likely to be severe. Cases occurring later in the
emergence of MERS-CoV and among health-care workers were less serious. This study represents an attempt to
estimate risk factors for an emerging infectious disease using open data and to address some of the uncertainty
surrounding MERS-CoV epidemiology. 

coronaviruses; emerging infections; MERS-CoV; Middle East respiratory syndrome coronavirus; respiratory 
infections; zoonotic infections 


Abbreviation: MERS-CoV, Middle East respiratory syndrome coronavirus. 


Middle East respiratory syndrome coronavirus (MERS-CoV)
is a stage 3 zoonosis that has been reported in 26 countries,
including the United States (1, 2). The virus was first recognized
in Saudi Arabia in 2012, though it may have been circulating in
the region much longer (3, 4). As of August 18, 2015, there
have been 1,413 confirmed cases and 502 deaths (5). The
virus causes severe respiratory illness in humans and has a
mortality rate of 30%-40% (6). Treatment for MERS-CoV
cases is limited to supportive care.

Certain groups may be at higher risk of contracting the virus
or of having their cases ascertained due to illness severity,
including males and those with comorbid medical conditions,
such as diabetes and heart disease. Common symptoms include
fever, cough, shortness of breath, chest pain, and diarrhea
(7, 8). The virus is probably transmitted from camels to
humans, and stuttering chains (groups of cases linked by a
continuous chain of transmission events that arise periodically)
of human-to-human transmission are also possible (4,
7-10). Human-to-human transmission occurs between 2 people
in close contact, a circumstance common in households
and health-care settings. Early identification and isolation of
cases are critical for limiting spread of the virus.

Information on the epidemiology of MERS-CoV has been
limited to date. Prior work on the 2013 influenza A(H7N9)
outbreak found that line listings of cases aggregated from publicly
available sources like media and public health reports compare
favorably to official line listings (11). These public line listings
can be used to gain insight into an ongoing outbreak in a timely
manner, as official data tend to be released only after outbreaks
are over. Real-time analyses are vital to planning and
implementing effective public health control measures to prevent
the spread of the disease. We used publicly available data to
evaluate the risks of death and severe disease among patients
with MERS-CoV.

METHODS 
Data sources 

A publicly accessible line listing of MERS-CoV cases,
maintained by Dr. Andrew Rambaut and available online
(12), was accessed on August 4, 2015. This line listing
contained 1,291 cases of MERS-CoV infection pulled from a
number of sources, including the World Health Organization
and the government of the Kingdom of Saudi Arabia. This
data set has often been more up-to-date than official World
Health Organization case reports, especially early in the
epidemic. The outcomes available are as reported and do not
necessarily reflect the final status of the patient after prolonged
follow-up, so some misclassification of outcomes is possible.
The majority of MERS-CoV cases occurred in Saudi Arabia,
South Korea, and the United Arab Emirates (Appendix Table 1).
The outbreak in South Korea was excluded from the analysis
because of its unique nature, resulting in 1,105 cases after
exclusion.
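
The case counts before and after exclusion can be checked against the country totals in Appendix Table 1. The following is a minimal sketch for illustration only: the counts are transcribed from that table, and the names `CASES_BY_COUNTRY` and `exclude_countries` are hypothetical rather than part of the original analysis.

```python
# Case counts by country, transcribed from Appendix Table 1 (line listing
# accessed August 4, 2015). "Missing" means no country was recorded.
CASES_BY_COUNTRY = {
    "France": 1, "Iran": 8, "Italy": 2, "Jordan": 20, "Saudi Arabia": 959,
    "Kuwait": 3, "Lebanon": 1, "Oman": 9, "Qatar": 15, "South Korea": 186,
    "Tunisia": 2, "United Arab Emirates": 77, "United Kingdom": 2,
    "Yemen": 1, "Missing": 5,
}


def exclude_countries(counts, excluded):
    """Return a copy of the counts without the named countries."""
    return {country: n for country, n in counts.items() if country not in excluded}


total = sum(CASES_BY_COUNTRY.values())
analyzed = exclude_countries(CASES_BY_COUNTRY, {"South Korea"})
print(total, sum(analyzed.values()))  # 1291 1105
```

Removing the 186 South Korean cases from the 1,291 listed cases leaves the 1,105 cases analyzed.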

Exposure definition and covariate selection 

Outcomes of interest were death and severe disease. The
status of the patient as either alive or deceased was determined by
whether or not the patient had died at the time of initial
reporting. Patients were considered to have severe disease if they
had either died from their infection or required critical care
at the time of initial reporting, as opposed to those who
experienced few or less serious complications.
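
The two outcome definitions can be written as a short rule. The sketch below is illustrative; the `CaseReport` type and its field names are hypothetical and do not reflect the layout of the line listing.

```python
from dataclasses import dataclass


@dataclass
class CaseReport:
    """Status of one case at initial reporting (hypothetical field names)."""
    died: bool
    critical_care: bool


def is_severe(case: CaseReport) -> bool:
    """Severe disease: died or required critical care at initial reporting."""
    return case.died or case.critical_care


cases = [
    CaseReport(died=True, critical_care=False),
    CaseReport(died=False, critical_care=True),
    CaseReport(died=False, critical_care=False),
]
print([is_severe(c) for c in cases])  # [True, True, False]
```

Under this definition every fatal case is also a severe case, so the severe-disease outcome is a superset of the death outcome.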

Risk factors considered were the patient’s age, the date of
onset of the infection, the presence or absence of any underlying
comorbidity such as cardiac or renal disease, reported
contact with camels or other animals, whether or not the
patient was employed as a health-care worker, whether the case
was a primary or secondary case (based on reported
contact with an existing case), whether or not the case arose in
Saudi Arabia (the nation in which the majority of cases
originated), the patient’s sex, the number of days since January 1,
2012, and the time between onset of infection and subsequent
hospitalization.
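
The two time-based covariates are simple date differences. A minimal sketch follows, assuming onset and admission are available as calendar dates; the function names are hypothetical.

```python
from datetime import date

ORIGIN = date(2012, 1, 1)  # reference date named in the text


def days_since_origin(onset: date) -> int:
    """Time of onset, in days since January 1, 2012."""
    return (onset - ORIGIN).days


def hospitalization_delay(onset: date, admitted: date) -> int:
    """Days between onset of infection and subsequent hospitalization."""
    return (admitted - onset).days


onset = date(2014, 6, 30)
print(days_since_origin(onset))                         # 911
print(hospitalization_delay(onset, date(2014, 7, 5)))   # 5
```

For orientation, day 911 (the mean time of onset reported in Table 1) corresponds to June 30, 2014.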

Missing data 

Because of the emerging nature of the disease, the widely
varying sources from which the case reports were drawn,
difficulty in case ascertainment, and sparse reporting, the data set
used (12) had extensively missing data. There were 920 cases
with missing information on 1 or more variables (including
outcome variables), making conventional complete-case
analysis essentially impossible. Because there was no evidence
that these data were missing completely at random,
estimates from a complete-case analysis could also be biased.
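
To make the scale of the problem concrete, the sketch below counts complete cases in a toy data set and then repeats the arithmetic for the figures quoted above. It assumes the 920 incomplete cases are counted among the 1,105 analyzed cases; the records and field names are invented for illustration.

```python
def is_complete(record: dict) -> bool:
    """True when no field of the record is missing (None)."""
    return all(value is not None for value in record.values())


# Three made-up records; the field names are hypothetical.
records = [
    {"age": 61, "sex": "M", "health_care_worker": None},
    {"age": 45, "sex": "F", "health_care_worker": True},
    {"age": None, "sex": "M", "health_care_worker": False},
]
complete = [r for r in records if is_complete(r)]
print(len(complete))  # 1

# Figures quoted in the text, assuming both refer to the analyzed cases.
n_total, n_incomplete = 1105, 920
n_complete = n_total - n_incomplete
pct_incomplete = round(100 * n_incomplete / n_total, 1)
print(n_complete, pct_incomplete)  # 185 83.3
```

Under that assumption, a complete-case analysis would discard about 5 of every 6 cases.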

We used a bootstrap-based expectation maximization
method to multiply impute the missing information (13).
One hundred imputations were used, based on the assumption
that all data for the variables included in the analysis,
missing or observed, came from a multivariate normal
distribution. A ridge prior of 1% of the empirical data was used to
assist with the numerical stability of the algorithm. The ridge
prior in essence adds an additional number of observations
equal to 1% of the data set with the same mean and variance
as the observed data, but with no covariance. This shrinks the
covariance between the variables in the imputation model and
assists the algorithm in converging on a stable solution, which
is sometimes necessary with high degrees of missingness, as in
this case. Priors using 0.5% of the data or 2% of the data did
not result in meaningful differences in the results (not shown).
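
The verbal description of the ridge prior implies a simple shrinkage factor. If n real observations are pooled with k pseudo-observations that share the observed means and variances but have zero covariance, each covariance (and therefore each correlation) is multiplied by n / (n + k). The sketch below is a simplified reading of that description, not the internal computation of the imputation software; the function names and the example correlation of 0.50 are hypothetical.

```python
def ridge_shrinkage(n_obs: int, prior_fraction: float) -> float:
    """Factor by which covariances shrink when pseudo-observations with the
    observed means and variances, but zero covariance, are pooled with the data."""
    n_prior = prior_fraction * n_obs
    return n_obs / (n_obs + n_prior)


def shrunk_correlation(r: float, n_obs: int, prior_fraction: float) -> float:
    # The prior leaves variances unchanged, so a correlation shrinks by the
    # same factor as the covariance.
    return r * ridge_shrinkage(n_obs, prior_fraction)


for fraction in (0.005, 0.01, 0.02):
    print(fraction, round(shrunk_correlation(0.50, 1105, fraction), 4))
```

On this reading, priors of 0.5%, 1%, and 2% shrink a correlation by roughly 0.5%, 1%, and 2%, respectively, which is consistent with the observation that the choice among them made little difference.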

Regression models 

Poisson regression models using a robust variance estimator
(14) were used to estimate the univariate relative risk of either
outcome according to each potential risk factor. Estimates from
these models are comparable to those obtained using binomial
regression, though the models are often more computationally
tractable. Those variables
that were moderately associated (P < 0.20) with the outcome
were included in a multivariate risk model. All analyses were
performed with the R statistical programming language (R
Foundation for Statistical Computing, Vienna, Austria) using
the Amelia2 package for multiple imputation (15).
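
For a single binary risk factor, the Poisson model with a robust variance has a closed form: the fitted relative risk is the ratio of the two observed risks, and the sandwich variance of log(RR) reduces to 1/a - 1/n1 + 1/c - 1/n0, where a of n1 exposed cases and c of n0 unexposed cases had the outcome. The sketch below applies this to the observed (non-imputed) counts for severe disease by comorbidity in Table 1. It is written in Python for illustration and is not the R code used in the analysis; the function name is hypothetical.

```python
import math


def robust_poisson_rr(exposed_events, exposed_total,
                      unexposed_events, unexposed_total, z=1.96):
    """Relative risk and 95% CI for one binary risk factor.

    With a single binary covariate, a log-link Poisson model reproduces the
    two group risks exactly, and the robust (sandwich) variance of log(RR)
    equals 1/a - 1/n1 + 1/c - 1/n0.
    """
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(1 / exposed_events - 1 / exposed_total
                          + 1 / unexposed_events - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Observed counts from Table 1: 361 of 565 cases with comorbidity and
# 143 of 526 cases without comorbidity were severe.
rr, lower, upper = robust_poisson_rr(361, 565, 143, 526)
print(f"RR = {rr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

Because this calculation uses only cases with an observed comorbidity status, its result is not expected to match the imputation-based estimate reported in Table 2.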

Human subjects approval 

Because this work used entirely publicly available information
with no personal identifiers, it was determined to not
require approval by an institutional review board.

RESULTS 

Demographic characteristics 

The distribution of patient ages for both fatal and nonfatal 
cases is shown in Figure 1. The distributions of other variables, 
including the numbers of missing values, are reported in 
Table 1. 
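
Figure 1 uses Gaussian kernel smoothing. A minimal sketch of that technique is given here; the bandwidth rule (a normal-reference rule of thumb) is an assumption, since no bandwidth is reported, and the ages are invented for illustration rather than taken from the line listing.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a function estimating the density of `sample` with Gaussian kernels."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not the paper's choice.
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in sample)

    return density


# Made-up ages for illustration only.
fatal_ages = [52, 61, 66, 70, 73, 58, 80, 64]
nonfatal_ages = [25, 31, 38, 44, 47, 29, 55, 40]
fatal_density = gaussian_kde(fatal_ages)
nonfatal_density = gaussian_kde(nonfatal_ages)
for age in (30, 50, 70):
    print(age, round(fatal_density(age), 4), round(nonfatal_density(age), 4))
```

Each curve is an average of normal densities centered on the observed ages, so it integrates to 1 and can be compared directly between groups of different sizes.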

Figure 1. Gaussian kernel-smoothed age distributions of fatal and
nonfatal cases of Middle East respiratory syndrome coronavirus from
2012 to 2015.

Table 1. Demographic and Risk Factor Characteristics of Patients With Reported Middle East Respiratory Syndrome
Coronavirus Infection, 2012-2015

                                            All Patients               Severe Cases
Variable                                      No.      %  Mean (SD)      No.      %  Mean (SD)
Age, years                                                50 (18)                    57 (17)
  Missing data                                 11    1.0                  11    2.1
Time of onset (days since January 1, 2012)                911 (255)                  881 (277)
  Missing data                                461   41.7                 163   31.8
Underlying comorbidity
  Yes                                         565   51.1                 361   70.4
  No                                          526   47.6                 143   27.9
  Missing data                                 14    1.3                   9    1.8
Reported animal contact
  Yes                                         105    9.5                  53   10.3
  No                                          278   25.2                 146   28.5
  Missing data                                722   65.3                 314   61.2
Reported camel contact
  Yes                                          84    7.6                  41    8.0
  No                                          233   21.1                 117   22.8
  Missing data                                788   71.3                 355   69.2
Health-care worker
  Yes                                         168   15.2                  38    7.4
  No                                          351   31.8                 189   36.8
  Missing data                                586   53.0                 286   55.8
Case type
  Primary                                     216   19.5                 130   25.3
  Secondary                                   484   43.8                 151   29.4
  Missing data                                405   36.7                 232   45.2
Case origin
  Saudi Arabia                                959   86.8                 457   89.1
  Other country                               146   13.2                  56   10.9
  Missing data
Sex
  Male                                        736   66.6                 370   72.1
  Female                                      346   31.3                 132   25.7
  Missing data                                 23    2.1                  11    2.1
Delay in hospitalization, days                            4.91 (4.41)                3.80 (4.39)
  Missing data                                577   52.2                 216   42.1

Abbreviation: SD, standard deviation.

Risk factors for reported mortality

The estimated relative risks of death and corresponding
95% confidence intervals for the covariates described in the
Methods section are shown in Table 2. As with any emerging
infection, both the presence and the absence of associations
with putative risk factors warrant reporting. Univariate analysis
showed that reported contact with camels or other animals,
cases occurring in Saudi Arabia, and case type (a case’s being
primary vs. secondary) were not associated with reported
mortality. Employment as a health-care worker and an increased
amount of time between disease onset and hospitalization
had minor protective associations with reported mortality.
Older age and underlying comorbidity were associated with
increased risks of mortality, while female patients and cases
with a later time of infection onset (in days since January 1,
2012) had lower risks of mortality. Upon multivariate
adjustment, most of the estimated associations were attenuated, and
neither female sex nor time between disease onset and
hospitalization remained an independent risk factor.
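
The P < 0.20 screening rule described in the Methods can be approximated from a published relative risk and its 95% confidence interval. The sketch below recovers an approximate two-sided Wald P value on the log scale. The authors' exact P values are not reported, and the published figures are rounded, so these values are rough; the function names are hypothetical, and the two example rows are the univariate estimates for death from Table 2.

```python
import math


def wald_p_value(rr, lower, upper, z_crit=1.96):
    """Approximate two-sided P value recovered from an RR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def passes_screen(rr, lower, upper, threshold=0.20):
    """True when the approximate P value falls below the screening threshold."""
    return wald_p_value(rr, lower, upper) < threshold


# Univariate estimates for death, copied from Table 2.
univariate = {
    "Camel contact": (1.19, 0.73, 1.93),
    "Female sex": (0.75, 0.56, 1.00),
}
for name, (rr, lo, hi) in univariate.items():
    print(name, round(wald_p_value(rr, lo, hi), 2), passes_screen(rr, lo, hi))
```

In this approximation camel contact falls well above the threshold and female sex falls below it, which agrees with the presence of an adjusted estimate for female sex, but not for camel contact, in Table 2.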

Risk factors for reported severe disease 

The estimated relative risks of severe disease and
corresponding 95% confidence intervals are shown in Table 2.
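
Each estimate summarizes results from 100 imputed data sets. The text does not state how the per-imputation estimates were combined; a common approach, assumed here, is Rubin's rules, in which the pooled estimate is the mean across imputations and the total variance adds the between-imputation variance to the average within-imputation variance. The numbers below are invented for illustration.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates (e.g. log relative risks) and their
    squared standard errors using Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Made-up log relative risks and variances from 5 imputed data sets.
log_rrs = [0.50, 0.46, 0.55, 0.48, 0.51]
variances = [0.010, 0.011, 0.009, 0.010, 0.010]
log_rr, se = pool_rubin(log_rrs, variances)

# Normal approximation; Rubin's degrees-of-freedom adjustment is omitted.
rr = math.exp(log_rr)
ci = (math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se))
print(round(rr, 2), tuple(round(x, 2) for x in ci))
```

The between-imputation term widens the interval to reflect uncertainty about the missing values, which matters when, as here, much of the data is imputed.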


In univariate analysis, reported contact with camels or other
animals, case origin in Saudi Arabia, and longer
delays between disease onset and hospitalization were not
associated with an increased risk of severe disease. Increased
age and the presence of underlying comorbidity were
associated with an increased risk of severe disease. Female sex,
being a secondary case, having a case arising later in time, and
employment as a health-care worker were protective against
severe disease.

As with the risk of reported death, the multivariate
associations were largely attenuated from the univariate associations,
and notably, female sex was no longer protective once other
variables had been controlled for. As compared with the risk
of death, the estimated associations for severe disease were
frequently closer to the null.

Table 2. Estimated Relative Risks of Death and Severe Disease at the Time of Reporting for Patients With Middle
East Respiratory Syndrome Coronavirus, 2012-2015

                          Death                                 Severe Disease
Variable                  RR    95% CI       aRR a 95% CI       RR    95% CI       aRR b 95% CI
Age                       1.02  1.02, 1.03   1.01  1.00, 1.02   1.02  1.02, 1.02   1.01  1.01, 1.01
Time of onset c           1.00  1.00, 1.00   1.00  1.00, 1.00   1.00  1.00, 1.00   1.00  1.00, 1.00
Underlying comorbidity    2.51  1.87, 3.37   1.99  1.39, 2.86   2.23  1.93, 2.46   1.65  1.39, 1.97
Animal contact            1.16  0.74, 1.80                      1.10  0.89, 1.35
Camel contact             1.19  0.73, 1.93                      1.10  0.89, 1.37
Health-care worker        0.52  0.33, 0.81   0.46  0.28, 0.75   0.49  0.40, 0.60   0.61  0.48, 0.79
Secondary case            0.84  0.60, 1.18                      0.60  0.52, 0.70   0.82  0.69, 0.97
Saudi Arabia              0.85  0.60, 1.21                      1.18  0.95, 1.45   1.24  1.02, 1.52
Female sex                0.75  0.56, 1.00   0.93  0.70, 1.25   0.77  0.66, 0.89   0.92  0.81, 1.06
Hospitalization delay d   0.85  0.81, 0.89   0.99  0.95, 1.03   0.99  0.97, 1.01

Abbreviations: aRR, adjusted relative risk; CI, confidence interval; RR, relative risk.

a Multivariate model that adjusted for age, presence of comorbidity, reported contact with animals, health-care
worker status, case type (primary vs. secondary), and patient sex.

b Multivariate model that adjusted for age, time of onset, presence of comorbidity, health-care worker status, case
type (primary vs. secondary), and patient sex.

c Days since January 1, 2012.

d Reported number of days between onset and subsequent hospitalization.
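
Relative risks for continuous covariates in Table 2 are small per unit but compound over the covariate's range, because the model is linear on the log scale. The sketch below assumes that the age estimates are per year of age, which the table does not state explicitly; the per-day value of 0.9995 is hypothetical and is included only to show how an effect can be hidden by rounding to 2 decimal places.

```python
def rr_over_span(rr_per_unit: float, units: float) -> float:
    """Relative risk implied over `units` of a covariate under a log-linear model."""
    return rr_per_unit ** units


# Univariate RR of death for age from Table 2, assumed to be per year of age.
for years in (10, 20, 30):
    print(years, round(rr_over_span(1.02, years), 2))

# A hypothetical per-day RR of 0.9995 would be printed as 1.00 in a table,
# yet it implies a noticeably lower risk one year later.
print(round(rr_over_span(0.9995, 365), 2))
```

This may explain how cases with a later time of onset can be described as lower risk even though the per-day estimates for time of onset appear as 1.00 (1.00, 1.00).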

DISCUSSION 

The emergence of a novel infectious disease presents a
particular challenge to timely epidemiologic research, as the
existence of sparse and irregularly collected data competes with
the need to identify risk factors associated with the disease
and its outcomes. A dearth of openly shared data impedes
research efforts, such as the construction of mathematical
models or broader-scale risk assessments. We have attempted to
address this for MERS-CoV, using a regularly updated,
publicly available data set. The use of multivariate models with
allowance for extensively missing data has shown that some
previously suggested risk factors do not appear to be risk
factors after adjustment for other covariates.
For example, female patients were not necessarily at lower
risk for disease after adjustment, nor were primary cases at
higher risk for fatal infections. Issues of data quality and
“missingness” during outbreaks necessitate the use of robust
techniques for handling missing data.

We found that older age and underlying comorbidity were
associated with increased risks of both death and severe
disease. While not a surprising finding, this does suggest that
older and sicker patients merit heightened vigilance.
Additionally, cases arising progressively later during the epidemic
have been associated with lower risks of both death and
severe disease at the time of initial reporting, suggesting that
treatment methods for MERS-CoV may be increasing in
efficacy. Alternatively, the proportion of mild and asymptomatic
cases has been rising over time, suggesting that less severe
cases are becoming more likely to be ascertained as a result
of epidemiologic investigation. This is supported by temporal
trends in the missingness of the data, which grows less severe
later in the epidemic.

This study was not without limitations, especially those
stemming from the data used. Patient outcomes were
identified at the time of reporting, rather than based on follow-up,
so it is possible that some patients counted as living or
without severe disease may have experienced serious or fatal
complications after reporting, which would not have been
recorded in the data. There is also the possibility of unmeasured
confounding biasing these estimates or the multiple
imputation model not fully addressing the missingness within the
data set. These issues are unlikely to be resolved without
more resource-intensive population-based studies.

Despite these shortcomings, the study represents an
attempt to quantify the known risk factors for MERS-CoV
using the best available and open data. While the estimates
are imperfect, they are superior to univariate associations
that do not control for confounding, and they are preferable to
allowing paralysis in the face of difficult and imperfect data to
deprive public health planners of potentially useful information.
These estimates can and should be revised as more becomes known
about the disease, but for the moment, they represent the
current state of our knowledge about MERS-CoV and its impact
on human health outcomes.
Appendix Table 1. Locations of known cases of Middle East
respiratory syndrome coronavirus as of August 4, 2015 a

Country                 No. of Cases
France                  1
Iran                    8
Italy                   2
Jordan                  20
Saudi Arabia            959
Kuwait                  3
Lebanon                 1
Oman                    9
Qatar                   15
South Korea             186 b
Tunisia                 2
United Arab Emirates    77
United Kingdom          2
Yemen                   1
Missing                 5
Total                   1,291

a Data were obtained from a publicly accessible line listing of cases
maintained by Dr. Andrew Rambaut (12).

b Cases from South Korea were excluded from the current
analysis.